15 research outputs found

    Optimizing the reliability and resource efficiency of MapReduce-based systems

    Full text link
    Due to the large increase in digital data in recent years, a new parallel computing paradigm has emerged for processing big data efficiently. Many of the systems based on this paradigm, also called data-intensive computing systems, follow the Google MapReduce programming model. The main advantage of MapReduce systems is the idea of sending the computation to where the data resides, aiming to provide scalability and efficiency. In failure-free scenarios, these frameworks usually achieve good results. However, most of the scenarios where they are used are characterized by the presence of failures. Consequently, these frameworks incorporate fault tolerance and dependability techniques as built-in features. On the other hand, dependability improvements are known to come with additional resource costs. This is reasonable, and the providers offering these infrastructures are aware of it. Nevertheless, not all approaches provide the same trade-off between fault tolerance capabilities (or, more generally, reliability capabilities) and cost. This thesis addresses the coexistence of reliability and resource efficiency in MapReduce-based systems through methodologies that introduce minimal cost while guaranteeing an appropriate level of reliability. To achieve this, we have proposed: (i) a formalization of a failure detector abstraction; (ii) an alternative solution to the single points of failure of these frameworks; and finally (iii) a novel feedback-based resource allocation system at the container level. These generic contributions have been evaluated using the Hadoop YARN architecture, which is nowadays the reference framework in the data-intensive computing systems community. The thesis demonstrates how all of our approaches outperform Hadoop YARN in terms of both reliability and resource efficiency.

    Cold Storage Data Archives: More Than Just a Bunch of Tapes

    Full text link
    The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics already exceeds the storage hardware globally fabricated per year. To that end, cold storage data archives are the---often overlooked---spearheads of modern big data analytics in scientific, data-intensive application domains. While high-performance data analytics has received much attention from the research community, the growing number of problems in designing and deploying cold storage archives has only received very little attention. In this paper, we take the first step towards bridging this gap in knowledge by presenting an analysis of four real-world cold storage archives from three different application domains. In doing so, we highlight (i) workload characteristics that differentiate these archives from traditional, performance-sensitive data analytics, (ii) design trade-offs involved in building cold storage systems for these archives, and (iii) deployment trade-offs with respect to migration to the public cloud. Based on our analysis, we discuss several other important research challenges that need to be addressed by the data management community

    Enhanced Failure Detection Mechanism in MapReduce

    Get PDF
    The popularity of the MapReduce programming model has increased the research community's interest in improving it. Among the possible directions, fault tolerance, and concretely the failure detection issue, appears to be a crucial one, yet it has not reached a satisfactory level so far. Motivated by this, I devoted my main research during this period to a prototype system architecture of a MapReduce framework with a new failure detection service, comprising both an analytical (theoretical) part and an implementation part. I am confident that this work can pave the way for further contributions on failure detection in NoSQL application frameworks and cloud storage systems in general.

    GMonE: a complete approach to cloud monitoring

    Get PDF
    The inherent complexity of modern cloud infrastructures has created the need for innovative monitoring approaches, as state-of-the-art solutions used for other large-scale environments do not address specific cloud features. Although cloud monitoring is nowadays an active research field, a comprehensive study covering all its aspects has not been presented yet. This paper provides a deep insight into cloud monitoring. It proposes a unified cloud monitoring taxonomy, based on which it defines a layered cloud monitoring architecture. To illustrate this architecture, we have implemented GMonE, a general-purpose cloud monitoring tool which covers all aspects of cloud monitoring by specifically addressing the needs of modern cloud infrastructures. Furthermore, we have evaluated the performance, scalability and overhead of GMonE with the Yahoo Cloud Serving Benchmark (YCSB), using the OpenNebula cloud middleware on the Grid’5000 experimental testbed. The results of this evaluation demonstrate the benefits of our approach, surpassing the monitoring performance and capabilities of alternatives present in state-of-the-art systems such as Amazon EC2 and OpenNebula.
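
    The layered, plugin-based design described above lends itself to a minimal illustration. The following Python sketch is only an interpretation of the abstract; GMonE's real interfaces are not given here, so every class and method name is an illustrative assumption.

        # Minimal sketch of a layered, plugin-based cloud monitoring design
        # in the spirit of the taxonomy above. All names are assumptions,
        # not GMonE's actual API.
        from abc import ABC, abstractmethod

        class MonitoringPlugin(ABC):
            """One pluggable probe per cloud layer (infrastructure, platform, ...)."""
            @abstractmethod
            def collect(self) -> dict:
                ...

        class VMProbe(MonitoringPlugin):
            def collect(self) -> dict:
                # Infrastructure layer: a real tool would query the hypervisor
                # or cloud middleware (e.g., OpenNebula) for VM statistics.
                return {"layer": "infrastructure", "cpu": 0.42, "mem_mb": 3072}

        class Aggregator:
            """Server side: gathers measurements from all registered plugins."""
            def __init__(self, plugins):
                self.plugins = plugins

            def snapshot(self) -> list[dict]:
                return [p.collect() for p in self.plugins]

        print(Aggregator([VMProbe()]).snapshot())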

    Feedback-Based Resource Allocation in MapReduce-Based Systems

    Get PDF
    Containers are considered an optimized, fine-grained alternative to virtual machines in cloud-based systems, and some MapReduce frameworks have adopted them. This paper analyzes the use of containers in MapReduce-based systems, concluding that the resource utilization of these systems in terms of containers is suboptimal. To address this, the paper describes AdaptCont, a proposal for optimizing container allocation in MapReduce systems, built on the foundations of feedback systems. Two different selection approaches, Dynamic AdaptCont and Pool AdaptCont, are defined: whereas Dynamic AdaptCont calculates the exact amount of resources per container, Pool AdaptCont chooses a predefined container from a pool of available configurations. AdaptCont is evaluated for a particular case, the application master container of Hadoop YARN. The evaluation shows that AdaptCont behaves much better than the default resource allocation mechanism of Hadoop YARN.
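
    A compact sketch of the two selection approaches, as we read them from the abstract: the memory-based sizing model, the pool configurations, and all constants below are invented for illustration, not taken from the paper.

        # Hedged sketch of the two AdaptCont selection approaches.
        POOL_CONFIGS_MB = [512, 1024, 2048, 4096]   # hypothetical predefined containers

        def dynamic_adaptcont(observed_peak_mb, headroom=1.1):
            """Dynamic AdaptCont: size each container to the exact estimated
            need (here: observed peak memory plus a small safety headroom)."""
            return int(observed_peak_mb * headroom)

        def pool_adaptcont(observed_peak_mb, headroom=1.1):
            """Pool AdaptCont: pick the smallest predefined configuration that
            still fits the estimated need, avoiding per-container sizing work."""
            need = observed_peak_mb * headroom
            for size in POOL_CONFIGS_MB:
                if size >= need:
                    return size
            return POOL_CONFIGS_MB[-1]   # fall back to the largest configuration

        def next_allocation(history_mb, use_pool=True):
            """Feedback loop: each new allocation is informed by what previous
            containers of the same job type actually consumed."""
            peak = max(history_mb) if history_mb else 1024   # default first guess
            return pool_adaptcont(peak) if use_pool else dynamic_adaptcont(peak)

    For example, next_allocation([700, 850, 900]) would return 1024 under the pool approach but 990 under the dynamic approach, trading packing precision for configuration simplicity.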

    Failure detector abstractions for MapReduce-based systems

    No full text
    Omission failures represent an important source of problems in data-intensive computing systems. In these frameworks, omission failures are caused by slow tasks, known as stragglers, which can strongly jeopardize workload performance. In the case of MapReduce-based systems, many state-of-the-art approaches have preferred to explore and extend speculative execution mechanisms, while other alternatives have based their contributions on doubling the computing resources for their tasks. Nevertheless, none of these approaches has addressed a fundamental aspect of detecting and then handling omission failures: the adjustment of the timeout service. In this paper, we study omission failures in MapReduce systems, formalizing their failure detector abstraction by means of three different algorithms for defining the timeout. The first abstraction, called High Relax Failure Detector (HR-FD), acts as a static alternative to the default timeout, able to estimate the completion time of the user workload. The second abstraction, called Medium Relax Failure Detector (MR-FD), dynamically modifies the timeout according to the progress score of each workload. Finally, taking into account that some user requests are strictly deadline-bounded, we introduce the third abstraction, called Low Relax Failure Detector (LR-FD), which is able to merge the MapReduce dynamic timeout with an external monitoring system in order to enforce more accurate failure detections. Whereas HR-FD shows performance improvements for most user requests (in particular, small workloads), MR-FD and LR-FD significantly enhance the current timeout selection in any kind of scenario, regardless of the workload type and failure injection time.
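
    The three abstractions lend themselves to a short sketch. The Python below is only our reading of the abstract: the function signatures, the heartbeat-based suspicion rule, and all constants are assumptions, not the paper's actual algorithms (Hadoop's fixed default task timeout of 600 s is the one real reference point).

        # Hedged sketch of the three failure-detector timeout policies.
        import time

        def hr_fd_timeout(estimated_completion, relax=1.5):
            # HR-FD: static timeout derived from an estimate of the workload's
            # completion time, replacing a fixed default such as Hadoop's 600 s.
            return relax * estimated_completion

        def mr_fd_timeout(base_timeout, progress_score):
            # MR-FD: dynamic timeout that tightens as the workload's progress
            # score (0.0 .. 1.0) grows, so late stragglers are suspected sooner.
            return base_timeout * max(1.0 - progress_score, 0.1)

        def lr_fd_suspect(last_heartbeat, base_timeout, progress_score, monitor_alive):
            # LR-FD: merge the dynamic timeout with an external monitoring
            # signal; suspect a task only when both sources point to a failure.
            silent_for = time.time() - last_heartbeat
            timed_out = silent_for > mr_fd_timeout(base_timeout, progress_score)
            return timed_out and not monitor_alive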

    Diarchy: An Optimized Management Approach for MapReduce Masters

    Get PDF
    The MapReduce community is progressively replacing classic Hadoop with YARN, the second-generation Hadoop (MapReduce 2.0). This transition is being made for many reasons, but primarily because of some scalability drawbacks of classic Hadoop. The new framework has appropriately addressed this issue and is being praised for its multi-functionality. In this paper, we carry out a probabilistic analysis that highlights some reliability concerns of YARN at the job master level. This is a critical point, since the failure of a job master involves the failure of all the workers managed by that master. We then propose Diarchy, a novel system for the management of job masters, whose aim is to increase the reliability of YARN based on the sharing and backup of responsibilities between two masters working as peers. The evaluation results show that Diarchy outperforms the reliability of YARN in different setups, regardless of cluster size, type of job, or average failure rate, and suggest a positive impact of this approach compared to the traditional single-master Hadoop architecture.
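
    The probabilistic intuition behind a two-peer master scheme can be illustrated with a back-of-the-envelope computation. The independence assumption and the failure probabilities below are ours, not the paper's analysis.

        # If each master fails independently with probability p during a job,
        # a single-master job is lost with probability p, while a Diarchy-style
        # pair of peer masters loses the job only if both fail.
        def job_loss_probability(p, masters=1):
            return p ** masters   # all replicas must fail to lose the job

        for p in (0.01, 0.05, 0.10):
            single = job_loss_probability(p, masters=1)
            diarchy = job_loss_probability(p, masters=2)
            print(f"p={p:.2f}  single-master loss={single:.4f}  diarchy loss={diarchy:.4f}")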